import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#!pip install plotly
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
COVID-19 has changed people's lives for a full year and has dealt a heavy blow to many industries in Canada. Tourism was among the hardest-hit sectors of the economy in 2020 as a result of physical distancing measures to contain the spread of COVID-19. According to Statistics Canada, tourism spending was almost cut in half (-48.1%) in 2020. Tourism gross domestic product (GDP) was down 47.9% annually, and tourism-related jobs fell 28.7% annually in 2020, with most of the drop occurring in the second quarter. All tourism categories were down in 2020, with food and beverage services (-32.3%) and accommodation (-35.2%) contributing most to the overall decline. In this story, I explore the impact of COVID-19 policies on travel to Canada and on tourism-related jobs, and offer some suggestions for tourism workers.
Let us first look at the policies Canada adopted in 2020 to limit international travel in response to COVID-19 (Wikipedia, 2020).
After reading about these policies, I wondered whether they were effective in reducing imported cases. I therefore used the data from the COVID-19 Canada Open Data Working Group (ccodwg), in which each row represents a unique positive case, including age, sex, health region, and travel history where available. I plotted two pie charts to compare the ratio of imported to locally acquired cases before and after March 21, 2020. Before the border officially closed, imported cases accounted for almost half of all cases; after the closure, the share of imported cases dropped sharply to 21.6%, which suggests that closing the border was effective and necessary. I then made a bar graph of the share of imported cases by country of travel. It shows that the US accounted for the largest share of Canada's imported cases (approximately one quarter), so closing the shared Canada-US border was a wise decision.
# loading this csv file may take a minute
df_cases=pd.read_csv("https://raw.githubusercontent.com/ccodwg/Covid19Canada/master/individual_level/cases_2020.csv")
df_cases['date_report'] = pd.to_datetime(df_cases['date_report'])  # parse report dates into one datetime format
df_cases = df_cases.sort_values(by='date_report')
df_cases = df_cases.reset_index(drop=True)  # renumber rows after sorting
# split the data into two periods at the border closure date
df_before = df_cases[df_cases['date_report'] <= '2020-03-21']
df_after = df_cases[df_cases['date_report'] > '2020-03-21']
plt.figure(figsize=(14,6.5))
plt.suptitle("Ratio of imported vs locally acquired cases before and after 3.21.2020")
plt.subplot(1,2,1)
plt.title("Imported vs Locally Acquired (1.1 - 3.21)")
label=["Imported Cases", 'Locally Acquired']
x=[df_before.travel_yn.value_counts()['1'],df_before.travel_yn.value_counts()['0']]
plt.pie(x, labels=label, autopct='%1.1f%%')
plt.subplot(1,2,2)
plt.title("Imported vs Locally Acquired (3.22 - 12.31)")
label=["Imported Cases", 'Locally Acquired']
x=[df_after.travel_yn.value_counts()['1'],df_after.travel_yn.value_counts()['0']]
plt.pie(x, labels=label, autopct='%1.1f%%')
plt.show()
plt.figure(figsize=(14,7))
imported = df_cases[df_cases.travel_history_country!='Not Reported'].travel_history_country.value_counts()
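# share of all reported imported cases, in percent; only countries with more than 10 cases are shown
x=list(imported[imported>10] / imported.sum() * 100)
y=list(imported[imported>10].index)
ax = sns.barplot(x=x, y=y, orient='h')  # x and y must be keyword arguments in seaborn 0.12 and later
plt.title("Imported cases from different countries")
plt.xlabel('Percentage of imported cases (%)')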
plt.show()
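The before/after comparison can also be computed as plain numbers rather than read off the pies. The following is a minimal sketch on six made-up rows (illustrative only, not real case data). `imported_share` is a hypothetical helper, and it assumes `travel_yn` holds the strings '1' and '0' as in the cell above.

```python
import pandas as pd

# Toy rows for illustration only; not real case data.
toy = pd.DataFrame({
    "date_report": pd.to_datetime(
        ["2020-03-01", "2020-03-15", "2020-03-20",
         "2020-04-02", "2020-05-10", "2020-06-01"]),
    "travel_yn": ["1", "0", "1", "0", "0", "1"],
})

def imported_share(df, cutoff="2020-03-21"):
    """Share of travel-related cases up to the cutoff date and after it."""
    known = df[df["travel_yn"].isin(["0", "1"])]   # ignore rows without a yes/no answer
    is_before = known["date_report"] <= pd.Timestamp(cutoff)
    imported = known["travel_yn"] == "1"
    return pd.Series({
        "before": imported[is_before].mean(),      # mean of booleans = share of True
        "after": imported[~is_before].mean(),
    })

shares = imported_share(toy)
print(shares)
```

On the real data, the same helper could be called as `imported_share(df_cases)`, assuming the column names used above.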
After verifying the effectiveness of the border closure, I was curious how traveler numbers changed in response to the policies and whether the trends held any interesting insights. The dataset I used is "International travelers entering or returning to Canada, by type of transport" from Statistics Canada, which reports the monthly number of travelers in three categories (https://doi.org/10.25318/2410004101-eng). I first made a line plot to check the overall trend and then a bar graph to check the trend for each category of traveler. The line graph shows a dramatic decrease in the number of travelers starting in March, after which the numbers stay low, which corresponds to the border closure. For the bar graph, travelers are divided into three categories: non-resident travelers, Canadian residents, and other travelers. International travelers are made up of non-resident travelers and other travelers, and other travelers are mainly truck drivers and crew. The bar graph shows that all three categories decreased from March onward, but the number of truck drivers and crew recovered quickly because they play a key role in the supply chain, such as keeping grocery store shelves stocked during the lockdown. Canadians also increased online consumption during the pandemic. In November alone, e-commerce sales increased by three-quarters (+75.9%) year-on-year (J.P. Morgan 2020 E-commerce Payments Trends Report). Most of these goods are delivered by truck. Tourist bus drivers who lost their jobs due to COVID-19 could therefore consider becoming truck drivers, and e-commerce is another career option for people who are unemployed.
df_border=pd.read_csv("https://www150.statcan.gc.ca/t1/tbl1/en/dtl!downloadDbLoadingData-nonTraduit.action?pid=2410004101&latestN=0&startDate=20200101&endDate=20201201&csvLocale=en&selectedMembers=%5B%5B1%5D%2C%5B1%2C2%2C45%2C86%5D%5D")
plt.figure(figsize=(14,6))
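# .copy() gives an independent frame, so the date conversion below does not write into a slice of df_border
df_total = df_border[df_border['Traveller characteristics'] == 'Total international travellers'].copy()
df_total['REF_DATE'] = pd.to_datetime(df_total['REF_DATE'])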
plt.plot(df_total['REF_DATE'], df_total['VALUE'])
plt.title("International travellers entering or returning to Canada")
plt.xlabel('Month')
plt.ylabel('Number of travellers')
plt.show()
plt.figure(figsize=(14,6.5))
sns.barplot(data = df_border[df_border['Traveller characteristics'] != 'Total international travellers'], x = 'REF_DATE', y = 'VALUE', hue = 'Traveller characteristics')
plt.title("Different types of travellers entering or returning to Canada")
plt.xlabel('Month')
plt.ylabel('Number of travellers')
plt.show()
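One way to make "recovered quickly" comparable across categories of very different size is to express each month as a fraction of the first month. The sketch below does this on made-up counts (not Statistics Canada figures); the group names are placeholders, and the column names simply mirror those used in the cell above.

```python
import pandas as pd

# Illustrative monthly counts only; not Statistics Canada figures.
toy = pd.DataFrame({
    "REF_DATE": ["2020-01", "2020-04", "2020-07"] * 2,
    "Traveller characteristics": ["Group A"] * 3 + ["Group B"] * 3,
    "VALUE": [1000, 50, 100, 400, 200, 360],
})

# One column per category, one row per month
wide = toy.pivot(index="REF_DATE", columns="Traveller characteristics", values="VALUE")

# Divide every row by the first month so each category starts at 1.0
relative = wide / wide.iloc[0]
print(relative)
```

In this toy example, Group B is back at 90% of its starting level by July while Group A is still at 10%, which is the kind of contrast the bar graph shows between categories.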
After this overview of the effect of the policies and of international travel trends, I wanted to explore the tourism industry in Canada before COVID-19 and learn how COVID-19 affected it. The Provincial and Territorial Tourism Satellite Account (PTTSA) provides an economic measure of the importance of tourism in terms of expenditures, gross domestic product, and employment for each of the provinces and territories. The dataset "Provincial and territorial gross domestic product (GDP) and employment generated by tourism and related measures" (https://doi.org/10.25318/2410004201-eng) from the PTTSA is used for the visualizations. It contains GDP and employment data for different tourism activities in each province. Gross domestic product (GDP) is the total monetary or market value of all the finished goods and services produced within a country's borders in a specific time period (Wikipedia, 2020).
The first graph investigates tourism GDP and employment in different provinces. The x-axis is the number of jobs (x 1,000) provided by the tourism industry, the y-axis is the ratio of tourism jobs to total jobs, and the size of each bubble represents tourism GDP (x 1,000,000). The graph shows that Ontario has the highest tourism GDP and provides the highest number of tourism jobs. However, tourism's share of total employment in Ontario is very low (3.2%). Yukon, by contrast, relies on the travel and tourism industry for more than 7.4% of its total employment, so it may suffer the most economic damage during COVID-19.
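The y-axis value is a simple ratio, and a small example shows why the region with the most tourism jobs need not be the one that depends on tourism most. This is a sketch with invented numbers (not taken from the PTTSA table); the region names are placeholders and the column names mirror the ones in the dataset.

```python
import pandas as pd

# Illustrative job counts in thousands; not taken from the PTTSA table.
jobs = pd.DataFrame(
    {"Total economy": [1000.0, 20.0], "Total tourism activities": [32.0, 1.5]},
    index=pd.Index(["Region A", "Region B"], name="GEO"),
)

# Share of total employment = tourism jobs / all jobs in the region
jobs["Share of total employment"] = (
    jobs["Total tourism activities"] / jobs["Total economy"]
)

print(jobs)
print("Most tourism jobs:", jobs["Total tourism activities"].idxmax())
print("Highest share:", jobs["Share of total employment"].idxmax())
```

Region A has far more tourism jobs, but Region B has the higher share, which is the same pattern the bubble chart shows for Ontario and Yukon.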
The second graph is a two-level pie chart that shows the GDP share of six tourism categories and the number of jobs each provides. Color shows the number of jobs, and the area of each sector shows its share of tourism GDP (e.g., transportation GDP in Canada / total tourism GDP in Canada). Even though food and beverage services provide the largest number of jobs, they rank only fourth in contribution to GDP. In contrast, transportation and accommodation provide relatively few jobs but both contribute heavily to GDP. If the reader clicks any of the six categories in the inner ring, the chart redraws with only that category and its sub-activities (the second level). At the second level, we can see that air transportation accounts for the largest share of transportation GDP. As we already know from part 2, the restriction policies caused a significant reduction in international travel, whose main means of transport is air travel, so the GDP of the tourism industry suffered a severe blow due to COVID-19.
df_gdp_employment = pd.read_csv("gdp_employment.csv")
df_gdp = df_gdp_employment[df_gdp_employment['Gross domestic product (GDP) and related measures'] == 'Tourism gross domestic product (GDP) at basic prices']
df_gdp = df_gdp[['GEO', 'Industry', 'VALUE']]
df_gdp = pd.pivot(df_gdp, index="GEO", columns="Industry")
df_gdp = df_gdp['VALUE'][['Total economy','Total tourism activities']]
df_gdp = df_gdp[df_gdp.index != 'Canada']
df_employment = df_gdp_employment[df_gdp_employment['Gross domestic product (GDP) and related measures'] == 'Number of jobs']
df_employment = df_employment[['GEO', 'Industry', 'VALUE']]
df_employment = pd.pivot(df_employment, index="GEO", columns="Industry")
df_employment = df_employment[df_employment.index != 'Canada']
df_e = df_employment['VALUE']['Total tourism activities']
df_s = df_employment['VALUE']['Total tourism activities'] / df_employment['VALUE']['Total economy']
df_province = df_gdp.copy()  # independent frame, so the columns added below do not write into a slice
df_province['jobs'] = df_e
df_province['Share of total employment'] = df_s
df_province['GEO'] = df_province.index
fig = px.scatter(df_province, x='jobs', y='Share of total employment', size='Total tourism activities', color='GEO', hover_name='GEO', size_max=70, labels=dict(jobs="Number of jobs (x1,000)"))
fig.show()
df_canada = df_gdp_employment[df_gdp_employment['GEO'] == 'Canada']
df_canada_gdp = df_canada[df_canada['Gross domestic product (GDP) and related measures'] == 'Tourism gross domestic product (GDP) at basic prices']
df_canada_gdp = df_canada_gdp[['Industry', 'VALUE']]
df_canada_employment = df_canada[df_canada['Gross domestic product (GDP) and related measures'] == 'Number of jobs']
df_canada_employment = df_canada_employment[['Industry', 'VALUE']].rename(columns={'VALUE': 'jobs'})
# join on Industry: the GDP rows and the job rows keep different row indices,
# so assigning one column to the other directly would align on the wrong labels
df_canada_gdp = df_canada_gdp.merge(df_canada_employment, on='Industry', how='left')
# keep only the leaf industries; their order in the data must match the grouping assigned below
leaf_industries = ['Air transportation', 'Railway transportation', 'Water transportation',
                   'Bus transportation', 'Taxis', 'Vehicle rental',
                   'Hotels', 'Motels', 'Camping', 'Other accommodation',
                   'Food and beverage services', 'Recreation and entertainment',
                   'Travel services', 'Other industries']
df_canada_gdp = df_canada_gdp[df_canada_gdp['Industry'].isin(leaf_industries)].copy()
df_canada_gdp['Industry2'] = [*['Transportation'] * 6, *['Accommodation'] * 4, 'Food and beverage services', 'Recreation and entertainment', 'Travel services', 'Other industries']
fig = px.sunburst(df_canada_gdp, path=['Industry2', 'Industry'], values='VALUE',
color='jobs', hover_data=['Industry'],
color_continuous_scale='RdBu',
color_continuous_midpoint=np.average(df_canada_gdp['jobs'], weights=df_canada_gdp['VALUE']))
fig.show()
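The `color_continuous_midpoint` above is a GDP-weighted average of the job counts rather than a plain mean, so the neutral color sits at the job level typical of the sectors that take up most of the chart area. A minimal sketch of the difference, using invented numbers rather than the PTTSA values:

```python
import numpy as np

# Illustrative values only: GDP sets the sector size, jobs sets the sector color.
gdp = np.array([10.0, 30.0, 60.0])
jobs = np.array([100.0, 50.0, 20.0])

plain_mean = jobs.mean()                        # every segment counts equally
weighted_mean = np.average(jobs, weights=gdp)   # larger-GDP segments count more

print(plain_mean, weighted_mean)
```

Here the weighted midpoint is lower than the plain mean because the largest sector by GDP has the fewest jobs.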
After learning about the composition of tourism GDP, let's analyze quarterly changes from 2016 to 2020, a five-year span that includes the epidemic in 2020, to observe the trend in each tourism segment. Two datasets were used for the visualizations in this part: Employment generated by tourism (x 1,000) (https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3610023201) and Tourism demand in Canada, constant prices (x 1,000,000) (https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3610023001), which contain information about employment and expenditures for tourism activities over the past five years.
On the first try, I plotted two clustered bar graphs to visualize employment and expenditures for each tourism activity. Although the audience can observe the change over the five years by comparing bar lengths, this is not intuitive, and the information the two graphs convey is limited. A different variable and a different type of graph are needed.
df_season_employment = pd.read_csv("season_employment.csv")
df_emp = df_season_employment[(df_season_employment['Seasonal adjustment'] == 'Unadjusted')]
df_emp = df_emp[df_emp['REF_DATE'] >= '2016-01']
df_emp = df_emp[df_emp['Activities'].isin(['Transportation', 'Accommodation', 'Food and beverage services', 'Other tourism activities'])]
df_emp = df_emp[['REF_DATE', 'VALUE', 'Activities']]
df_season_expenditures = pd.read_csv("season_expenditures.csv")
df_exp = df_season_expenditures[(df_season_expenditures['Seasonal adjustment'] == 'Unadjusted') & (df_season_expenditures['Categories'] == 'Tourism demand')]
df_exp = df_exp[df_exp['REF_DATE'] >= '2016-01']
df_exp = df_exp[df_exp['Expenditures'].isin(['Transportation', 'Accommodation', 'Food and beverage services', 'Other tourism commodities'])]
df_exp = df_exp[['REF_DATE', 'VALUE', 'Expenditures']]
fig = px.bar(df_emp, x="REF_DATE", y="VALUE", color="Activities", labels=dict(REF_DATE="Season", VALUE="Number of Jobs (x1,000)"))
fig.show()
fig = px.bar(df_exp, x="REF_DATE", y="VALUE", color="Expenditures", labels=dict(REF_DATE="Season", VALUE="Expenditures (x 1,000,000)", Expenditures="Activities"))
fig.show()
On the second try, I created a new variable for the rate of change. I used the pandas pct_change function, that is, (value of this quarter - value of last quarter) / value of last quarter, to compare adjacent quarters. A positive value means growth compared with the previous quarter, while a negative value means decline. To visualize the trend, I chose a line graph. The line graph below shows that the pattern of change is similar from year to year: expenditures grow in summer and shrink in winter.
Although the improved graph conveys more information than the previous one, it does not show well how the current year differs from the previous year, so further improvement is needed.
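As a small worked example of the quarter-over-quarter rate, the sketch below applies `pct_change` along the columns of a one-row frame laid out like the pivoted data (activities as rows, quarters as columns). The values are invented for illustration.

```python
import pandas as pd

# Illustrative quarterly values; not taken from the Statistics Canada tables.
toy = pd.DataFrame(
    [[100.0, 120.0, 150.0, 90.0]],
    index=["Accommodation"],
    columns=["2019-01", "2019-04", "2019-07", "2019-10"],
)

# (this quarter - previous quarter) / previous quarter, computed along the columns
qoq = toy.pct_change(axis="columns")
print(qoq)
```

The first quarter has no predecessor, so its rate is NaN; the remaining rates are +20%, +25%, and -40%.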
df_season_emp = df_emp
df_season_emp = df_season_emp.pivot(index = "Activities", columns="REF_DATE", values="VALUE" )
df_season_rate_emp = df_season_emp.pct_change(axis='columns')
df_season_rate_emp = df_season_rate_emp.unstack().reset_index(name='value')
df_season_rate_emp.dropna(axis=0, inplace=True)
df_season_exp = df_exp
df_season_exp = df_season_exp.pivot(index = "Expenditures", columns="REF_DATE", values="VALUE" )
df_season_rate_exp = df_season_exp.pct_change(axis='columns')
df_season_rate_exp = df_season_rate_exp.unstack().reset_index(name='value')
df_season_rate_exp.dropna(axis=0, inplace=True)
fig = px.line(df_season_rate_emp, x="REF_DATE", y="value", color='Activities', labels=dict(REF_DATE="Season", value="Season over season employment growth rate"))
fig.update_traces(mode='markers+lines')
fig.update_xaxes(
rangeslider_visible=True,
rangeselector=dict(
buttons=list([
dict(step="all"),
dict(count=16, label="recent 1 year", step="month", stepmode="backward")
])
)
)
fig.show()
fig = px.line(df_season_rate_exp, x="REF_DATE", y="value", color='Expenditures', labels=dict(REF_DATE="Season", value="Season over season expenditures growth rate", Expenditures="Activities"))
fig.update_traces(mode='markers+lines')
fig.update_xaxes(
rangeslider_visible=True,
rangeselector=dict(
buttons=list([
dict(step="all"),
dict(count=16, label="recent 1 year", step="month", stepmode="backward")
])
)
)
fig.show()
On my third try, instead of comparing adjacent quarters, I compared the same quarter in adjacent years: (value of the quarter this year - value of the same quarter last year) / value of the same quarter last year. The line graph shows that from 2017 to 2019 there was no obvious change for the same quarter in terms of tourism expenditures, but in 2020 there was a huge decrease (negative growth). As for employment, accommodation experienced the most serious loss in 2020, and transportation-related jobs saw the smallest reduction. As for expenditures, transportation experienced the most serious loss in 2020, while the loss in food and beverage services was relatively small compared with other segments. One possible reason is that air transportation accounts for much of tourism GDP.
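The year-over-year formula can be sketched on two years of invented quarterly values. With quarterly columns, the same quarter of the previous year is four columns to the left, so `shift(4, axis="columns")` supplies the denominator. The numbers are illustrative only.

```python
import pandas as pd

# Illustrative quarterly values; not taken from the Statistics Canada tables.
toy = pd.DataFrame(
    [[100.0, 120.0, 150.0, 90.0, 50.0, 30.0, 75.0, 45.0]],
    index=["Accommodation"],
    columns=["2019-01", "2019-04", "2019-07", "2019-10",
             "2020-01", "2020-04", "2020-07", "2020-10"],
)

previous_year = toy.shift(4, axis="columns")   # same quarter, one year earlier
yoy = (toy - previous_year) / previous_year    # relative change versus last year
print(yoy.dropna(axis="columns"))
```

The first four quarters have no earlier year to compare with, so they come out as NaN and are dropped before printing.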
# year-over-year rate: (this quarter - same quarter last year) / same quarter last year
df_year_rate_emp = df_season_emp.diff(4, axis='columns') / df_season_emp.shift(4, axis='columns')
df_year_rate_emp = df_year_rate_emp.unstack().reset_index(name='value')
df_year_rate_emp.dropna(axis=0, inplace=True)
df_year_rate_exp = df_season_exp.diff(4, axis='columns') / df_season_exp.shift(4, axis='columns')
df_year_rate_exp = df_year_rate_exp.unstack().reset_index(name='value')
df_year_rate_exp.dropna(axis=0, inplace=True)
fig = px.line(df_year_rate_emp, x="REF_DATE", y="value", color='Activities', labels=dict(REF_DATE="Season", value="Year over year employment growth rate"))
fig.update_traces(mode='markers+lines')
fig.update_xaxes(
rangeslider_visible=True,
rangeselector=dict(
buttons=list([
dict(step="all"),
dict(count=16, label="recent 1 year", step="month", stepmode="backward")
])
)
)
fig.show()
fig = px.line(df_year_rate_exp, x="REF_DATE", y="value", color='Expenditures', labels=dict(REF_DATE="Season", value="Year over year expenditures growth rate", Expenditures="Activities"))
fig.update_traces(mode='markers+lines')
fig.update_xaxes(
rangeslider_visible=True,
rangeselector=dict(
buttons=list([
dict(step="all"),
dict(count=16, label="recent 1 year", step="month", stepmode="backward")
])
)
)
fig.show()
In this project, I explored four topics: the effectiveness of travel restrictions (pie charts and a bar graph), trends in international travelers (a line graph and a bar graph), the tourism industry in Canada (an interactive bubble chart and a two-level pie chart), and changes in the tourism industry over time (three iterations with interactive clustered bar graphs and line graphs). Since the audience is the general public, the complexity of the visualizations follows a "shallow to deep" order. The graphs in the first two parts contain less information than those in the third part and are therefore easy to read even without text. As the audience becomes more familiar with the background of the story, they can understand the more complex, interactive graphs in part 3. The text and legends for the graphs in part 3 also help the audience understand the story. To avoid bias, most of the data were taken from Statistics Canada, which supports data reliability and fairness. When designing the story, I looked not only for decreasing trends (although those are real) but also for any "abnormal" patterns, like the truck driver and crew numbers in part 2, to avoid personal bias.
In conclusion, the travel restrictions effectively reduced imported cases and the number of international travelers. The number of truck drivers and crew recovered quickly after the restrictions were implemented, due to increasing e-commerce activity and the role these workers play in the supply chain. Transportation contributes heavily to tourism GDP, and it also suffered the most from the restrictions. As for employment, accommodation experienced the most serious loss in 2020, and transportation-related jobs saw the smallest reduction. People in tourism industries who lost their jobs due to COVID-19 could consider jobs in driving and e-commerce.
REF: